Health-Related Hot Topic Detection in Online Communities Using Text Clustering

Authors

  • Yingjie Lu
  • Pengzhu Zhang
  • Jingfang Liu
  • Jia Li
  • Shasha Deng

Abstract

Recently, health-related social media services, especially online health communities, have rapidly emerged. Patients with various health conditions participate in online health communities to share their experiences and exchange healthcare knowledge. Exploring hot topics in online health communities helps us better understand patients' needs and interest in health-related knowledge. However, the statistical topic analysis employed in previous studies is becoming impractical for processing the rapidly increasing amount of online data. Automatic topic detection based on document clustering is an alternative approach for extracting health-related hot topics in online communities. In addition to the keyword-based features used in traditional text clustering, we integrate medical domain-specific features to represent the messages posted in online health communities. Three disease discussion boards from an online health community, devoted to lung cancer, breast cancer and diabetes, are used to test the effectiveness of topic detection. Experimental results demonstrate that health-related hot topics primarily include symptoms, examinations, drugs, procedures and complications. Further analysis reveals significant differences among the hot topics discussed on different types of disease discussion boards.
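
The general idea in the abstract, keyword features combined with medical domain-specific features and then clustered, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm, feature weighting or medical vocabulary the authors used, so everything here is an assumption for illustration: the toy MEDICAL_LEXICON, the domain_weight parameter, the smoothed TF-IDF formula, plain k-means with a deterministic farthest-point start, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping terms to medical categories.
# An assumption for illustration; the paper's actual vocabulary is not given.
MEDICAL_LEXICON = {
    "cough": "symptom", "pain": "symptom", "fatigue": "symptom",
    "biopsy": "examination", "scan": "examination",
    "insulin": "drug", "metformin": "drug", "chemo": "drug",
}

def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def build_vectors(posts, domain_weight=2.0):
    """Unit-length vectors: TF-IDF keyword part plus weighted category counts."""
    docs = [tokenize(p) for p in posts]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    categories = sorted(set(MEDICAL_LEXICON.values()))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed inverse document frequency (one common variant).
        keyword_part = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        cats = Counter(MEDICAL_LEXICON[t] for t in doc if t in MEDICAL_LEXICON)
        domain_part = [domain_weight * cats[c] for c in categories]
        vec = keyword_part + domain_part
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def hot_topics(labels):
    """Rank clusters by number of posts; larger clusters are 'hotter'."""
    return Counter(labels).most_common()

# Example usage on made-up forum posts.
posts = [
    "My cough and chest pain got worse this week",
    "Constant cough and fatigue, pain at night",
    "Started insulin yesterday, also taking metformin",
    "Does metformin work better than insulin for you",
]
labels = kmeans(build_vectors(posts), k=2)
ranking = hot_topics(labels)
```

In this sketch the category counts pull posts about the same kind of medical concept together even when they share few literal keywords, which is one plausible reading of why domain-specific features would help; the relative weight of the two feature groups would need tuning on real data.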

Similar resources

Dark Web Portal Overlapping Community Detection Based on Topic Models

A hot research topic is the study and monitoring of online communities. Of course, homeland security institutions from many countries are using data mining techniques to perform this task, aiming to anticipate and avoid a possible menace to local peace. Tools such as social networks analysis and text mining have contributed to the understanding of these kinds of groups in order to develop count...

Automatic Detection and Localization of Surface Cracks in Continuously Cast Hot Steel Slabs Using Digital Image Analysis Techniques

Quality inspection is an indispensable part of modern industrial manufacturing. Steel as a major industry requires constant surveillance and supervision through its various stages of production. Continuous casting is a critical step in the steel manufacturing process in which molten steel is solidified into a semi-finished product called slab. Once the slab is released from the casting unit, th...

An Online Hot Topics Detection Approach Using the Improved Ant Colony Text Clustering Algorithm

Recently, with an increasing number of major events spreading across the Internet, research on online hot topic detection systems has received more and more attention. In this paper, we propose an unsupervised and efficient hot topic detection approach based on an improved ant colony text clustering (IACTC) algorithm. In view of the deficiencies of the basic ant colony text ...

Defining evaluation criteria for Health Information Systems using Human, organization and technology-fit factors (HOT-fit): systematic review

Introduction: The purpose of this study is to review a series of published studies on the evaluation of health information systems in order to determine the criteria for evaluating hospital information systems using the HOT-fit framework. Information sources or data: The present study is a review of articles from the English databases PubMed and Scopus and the Persian databases Irandoc...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Journal title:

Volume 8, Issue

Pages -

Publication date: 2013